

Search for: All records

Creators/Authors contains: "Kumar, Arun"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Background: Low back pain (LBP) is a significant public health problem that can result in physical disability and financial burden for the individual and society. Physical therapy is effective for managing LBP and includes evaluation of posture and movement, interventions directed at modifying posture and movement, and prescription of exercises. However, physical therapists have limited tools for objective evaluation of low back posture and movement and for monitoring of exercises, and this evaluation is limited to the time frame of a clinical encounter. There is a need for a valid tool that can be used to evaluate low back posture and movement and to monitor exercises outside the clinic. To address this need, a fabric-based wearable sensor, Motion Tape (MT), was developed and adapted for a low back use case. MT is a low-profile, disposable, self-adhesive skin-strain sensor made by spray coating piezoresistive graphene nanocomposites directly onto commercial kinesiology tape.
     Objective: The objectives of this study were to (1) validate MT for measuring low back posture and movement and (2) assess the acceptability of MT for users.
     Methods: A total of 10 participants without LBP were tested. A 3D optical motion capture system was used as the reference standard for low back kinematics. Retroreflective markers and a matrix of MTs were placed on the low back to measure kinematics (motion capture) and strain (MT) simultaneously during low back movements in the sagittal, frontal, and axial planes. Cross-correlation coefficients were calculated to evaluate the concurrent validity of MT strain against reference motion capture kinematics during each movement. The acceptability of MT was assessed through semistructured interviews conducted with each participant after laboratory testing; interview data were analyzed using rapid qualitative analysis to identify themes and subthemes of user acceptability.
     Results: Visual inspection of concurrent MT strain and low back kinematics indicated that MT can distinguish between different movement directions. Cross-correlation coefficients between MT strain and motion capture kinematics ranged from –0.915 to 0.983, and the strength of the correlations varied across MT placements and low back movement directions. Regarding user acceptability, participants expressed enthusiasm toward MT and believed that it would be helpful for remote interventions for LBP, while offering suggestions for improvement.
     Conclusions: MT was able to distinguish between different low back movements, and most MTs demonstrated moderate to high correlation with motion capture kinematics. This preliminary laboratory validation of MT provides a basis for future device improvements, which will also involve testing in a free-living environment. Overall, users found MT acceptable for use in physical therapy for managing LBP.
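To make the validity analysis concrete, here is a minimal sketch of the kind of peak normalized cross-correlation used to compare an MT strain signal with a motion-capture angle. The sampling rate, signal shapes, and noise model are illustrative assumptions, not the study's data or code.

```python
# Hedged sketch: concurrent-validity check between a strain signal and a
# motion-capture joint angle via normalized cross-correlation. All signals
# and the 100 Hz rate below are synthetic assumptions.
import numpy as np

def peak_cross_correlation(strain: np.ndarray, angle: np.ndarray) -> float:
    """Return the peak normalized cross-correlation coefficient in [-1, 1]."""
    s = (strain - strain.mean()) / strain.std()
    a = (angle - angle.mean()) / angle.std()
    # Full cross-correlation, scaled so the zero-lag value equals the
    # Pearson correlation of the two z-scored signals.
    xcorr = np.correlate(s, a, mode="full") / len(s)
    # Keep the sign: the study reports correlations from -0.915 to 0.983.
    return float(xcorr[np.abs(xcorr).argmax()])

t = np.linspace(0, 5, 500)                              # 5 s at an assumed 100 Hz
angle = 30 * np.sin(2 * np.pi * 0.4 * t)                # lumbar flexion angle (deg)
strain = 0.8 * angle + np.random.normal(0, 1, t.size)   # noisy strain response
print(f"peak r = {peak_cross_correlation(strain, angle):.3f}")
```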
  2. Large models such as GPT-3 and ChatGPT have transformed deep learning (DL), powering applications that have captured the public's imagination. Such models must be trained on multiple GPUs due to their size and computational load, driving the development of a bevy of model parallelism techniques and tools. Navigating such parallelism choices, however, is a new burden for DL users such as data scientists and domain scientists, who may lack the necessary systems know-how. The need for model selection, which leads to many models to train due to hyper-parameter tuning or layer-wise finetuning, compounds the situation with two more burdens: resource apportioning and scheduling. In this work, we unify these three burdens by formalizing them as a joint problem that we call SPASE: Select a Parallelism, Allocate resources, and Schedule. We propose a new information system architecture to tackle the SPASE problem holistically, exploiting the performance opportunities presented by joint optimization. We devise an extensible template for existing parallelism schemes and combine it with an automated empirical profiler for runtime estimation. We then formulate SPASE as a mixed-integer linear program (MILP). We find that direct use of an MILP solver is significantly more effective than several baseline heuristics. We optimize the system runtime further with an introspective scheduling approach. We implement all these techniques into a new data system we call Saturn. Experiments with benchmark DL workloads show that Saturn achieves 39% to 49% lower model selection runtimes than current DL practice.
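As a rough illustration of what an MILP over these three choices can look like, the sketch below solves a heavily simplified, single-wave relaxation of SPASE with PuLP: each model picks one (parallelism, GPU count) pair, all picks must fit on the cluster simultaneously, and the makespan is minimized. The configs, runtime estimates, and GPU budget are made up, and the paper's actual formulation also models scheduling over time, which this omits.

```python
# Hedged sketch: a single-wave relaxation of the SPASE MILP, not the
# paper's formulation. All numbers are invented stand-ins.
import pulp

models = ["m0", "m1", "m2"]
configs = {"ddp2": 2, "ddp4": 4, "pipe4": 4}      # parallelism scheme -> #GPUs
gpu_budget = 8
# Made-up runtime estimates (minutes), standing in for the empirical profiler.
est = {(m, c): 120.0 / g + 10.0 * i
       for i, m in enumerate(models) for c, g in configs.items()}

prob = pulp.LpProblem("spase_single_wave", pulp.LpMinimize)
x = {k: pulp.LpVariable(f"x_{k[0]}_{k[1]}", cat="Binary") for k in est}
T = pulp.LpVariable("makespan", lowBound=0)
prob += T                                          # objective: minimize makespan

for m in models:   # each model picks exactly one parallelism/allocation
    prob += pulp.lpSum(x[(m, c)] for c in configs) == 1
    prob += pulp.lpSum(est[(m, c)] * x[(m, c)] for c in configs) <= T
# All chosen allocations must fit on the cluster at once (the big simplification).
prob += pulp.lpSum(configs[c] * x[(m, c)] for m in models for c in configs) <= gpu_budget

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for (m, c), var in x.items():
    if var.value() > 0.5:
        print(f"{m} -> {c} on {configs[c]} GPUs")
```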
  3. Recent advances in Graph Neural Networks (GNNs) have changed the landscape of modern graph analytics. The complexity of GNN training and its scalability challenges have also sparked interest from the systems community, with efforts to build systems that provide higher efficiency and schemes to reduce costs. However, we observe that many such systems essentially reinvent the wheel, duplicating much prior work from the database world on scalable graph analytics engines. Further, they often tightly couple the scalability treatment of graph data processing with that of GNN training, resulting in entangled, complex problems and systems that often do not scale well along one of those axes. In this paper, we ask a fundamental question: how far can we push existing systems for scalable graph analytics and deep learning (DL) instead of building custom GNN systems? Are compromises inevitable on scalability and/or runtimes? We propose Lotan, the first scalable and optimized data system for full-batch GNN training with decoupled scaling that bridges the hitherto siloed worlds of graph analytics systems and DL systems. Lotan offers a series of technical innovations, including re-imagining GNN training as query-plan-like dataflows, execution plan rewriting, optimized data movement between systems, a GNN-centric graph partitioning scheme, and the first known GNN model batching scheme. We prototyped Lotan on top of GraphX and PyTorch. An empirical evaluation using several real-world benchmark GNN workloads reveals a promising, nuanced picture: Lotan significantly surpasses the scalability of state-of-the-art custom GNN systems, while often matching or coming only slightly behind on time-to-accuracy metrics. We also show the impact of our system optimizations. Overall, our work shows that the GNN world can indeed benefit from building on top of scalable graph analytics engines. Lotan's new level of scalability can also empower new ML-oriented research on ever-larger graphs and GNNs.
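The core of decoupled scaling is visible in a single GNN layer: neighbor aggregation is a data-movement-heavy operation suited to a graph/dataflow engine, while the per-node transformation is ordinary DL. The single-process PyTorch sketch below mimics that split with a sparse matmul followed by a linear layer; it is illustrative only, not Lotan's GraphX + PyTorch implementation, and all sizes are arbitrary.

```python
# Hedged sketch of the decoupled-scaling idea: separate neighbor aggregation
# (the graph engine's job, mimicked here by a sparse matmul) from the neural
# transformation (the DL engine's job). Not Lotan's actual code.
import torch

n, in_dim, out_dim = 1000, 32, 16
# Normalized adjacency as a sparse COO tensor (stand-in for the graph side).
idx = torch.randint(0, n, (2, 5000))
val = torch.full((5000,), 1.0 / 5)                 # toy edge normalization
A = torch.sparse_coo_tensor(idx, val, (n, n)).coalesce()

H = torch.randn(n, in_dim)                         # node feature matrix
W = torch.nn.Linear(in_dim, out_dim)               # per-node NN transform

agg = torch.sparse.mm(A, H)     # step 1: aggregation (graph-engine dataflow)
H_next = torch.relu(W(agg))     # step 2: per-node transform (DL engine)
print(H_next.shape)
```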
  4. Many applications that use large-scale machine learning (ML) increasingly prefer different models for subgroups (e.g., countries) to improve accuracy, fairness, or other desiderata. We call this emerging popular practice learning over groups, analogizing to GROUP BY in SQL, albeit for ML training instead of SQL aggregates. From the systems standpoint, this practice compounds the already data-intensive workload of ML model selection (e.g., hyperparameter tuning). Often, thousands of models may need to be trained, necessitating high-throughput parallel execution. Alas, most ML systems today focus on training one model at a time or, at best, parallelizing hyperparameter tuning. This status quo leads to resource wastage, low throughput, and high runtimes. In this work, we take the first step towards enabling and optimizing learning over groups from the data systems standpoint for three popular classes of ML: linear models, neural networks, and gradient-boosted decision trees. Analytically and empirically, we compare the standard approaches to executing this workload today: task parallelism and data parallelism. We find neither is universally dominant. We put forth a novel hybrid approach we call grouped learning that avoids redundancy in communications and I/O using a novel form of parallel gradient descent we call Gradient Accumulation Parallelism (GAP). We prototype our ideas in a system we call Kingpin, built on top of existing ML tools and the flexible massively parallel runtime Ray. An extensive empirical evaluation on large ML benchmark datasets shows that Kingpin matches or is 4x to 14x faster than state-of-the-art ML systems, including Ray's native execution and PyTorch DDP.
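A minimal caricature of grouped learning in the spirit of GAP: scan the data once per epoch, accumulate gradients into each group's model as its rows stream by, and apply one update per group per scan, so no group forces a separate pass over the data. The batch generator and model shapes below are invented for illustration; Kingpin distributes this pattern over Ray rather than running it in one process.

```python
# Hedged sketch of grouped learning: one data scan serves all group models,
# with gradients accumulated per group. A single-machine caricature only.
import torch

n_groups, dim = 3, 8
models = [torch.nn.Linear(dim, 1) for _ in range(n_groups)]
opts = [torch.optim.SGD(m.parameters(), lr=0.01) for m in models]
loss_fn = torch.nn.MSELoss()

def partition_batches():
    """Yield (features, labels, group_id) minibatches; stands in for data I/O."""
    for _ in range(10):
        g = int(torch.randint(0, n_groups, (1,)))
        yield torch.randn(32, dim), torch.randn(32, 1), g

for epoch in range(2):
    for o in opts:
        o.zero_grad()
    for X, y, g in partition_batches():      # single scan serves every group
        loss_fn(models[g](X), y).backward()  # gradients accumulate per group
    for o in opts:
        o.step()                             # one update per group per scan
```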
  5. Deep learning (DL) is revolutionizing many fields. However, there is a major bottleneck to the wide adoption of DL: the pain of model selection, which requires exploring a large config space of model architectures and training hyper-parameters before picking the best model. The two existing popular paradigms for exploring this config space pose a false dichotomy. AutoML-based model selection explores configs with high throughput but makes minimal use of human intuition. Alternatively, interactive human-in-the-loop model selection relies entirely on human intuition to explore the config space but often has very low throughput. To mitigate the above drawbacks, we propose a new paradigm for model selection that we call intermittent human-in-the-loop model selection. In this demonstration, we will showcase our approach using five real-world DL model selection workloads. A short video of our demonstration can be found here: https://youtu.be/K3THQy5McXc.
  6. Background: Hip-worn accelerometer cut-points have poor validity for assessing children's sedentary time, which may partly explain the equivocal health associations shown in prior research. Improved processing and classification methods for these monitors would enrich the evidence base and inform the development of more effective public health guidelines. The present study aimed to develop and evaluate a novel computational method (CHAP-child) for classifying sedentary time from hip-worn accelerometer data.
     Methods: Participants were 278 children aged 8 to 11 years, recruited from nine primary schools of differing socioeconomic status in Melbourne, Australia. Participants concurrently wore a thigh-worn activPAL (ground truth) and a hip-worn ActiGraph (test measure) during up to 4 seasonal assessment periods, each lasting up to 8 days. activPAL data were used to train and evaluate the CHAP-child deep learning model, which classifies each 10-s epoch of raw ActiGraph acceleration data as sitting or non-sitting, creating comparable information from the two monitors. CHAP-child was evaluated alongside the current-practice 100 counts per minute (cpm) method for hip-worn ActiGraph monitors. Performance was tested for each 10-s epoch and for participant-season level sedentary time and bout variables (e.g., mean bout duration).
     Results: Across participant-seasons, CHAP-child correctly classified each epoch as sitting or non-sitting relative to activPAL, with a mean balanced accuracy of 87.6% (SD = 5.3%). Sit-to-stand transitions were correctly classified with a mean sensitivity of 76.3% (SD = 8.3). For most participant-season level variables, CHAP-child estimates were within ±11% (mean absolute percent error [MAPE]) of activPAL, and correlations between CHAP-child and activPAL were generally very large (> 0.80). For the current-practice 100 cpm method, most MAPEs were greater than ±30% and most correlations were small or moderate (≤ 0.60) relative to activPAL.
     Conclusions: There was strong support for the concurrent validity of the CHAP-child classification method, which allows researchers to derive activPAL-equivalent measures of sedentary time, sit-to-stand transitions, and sedentary bout patterns from hip-worn triaxial ActiGraph data. Applying CHAP-child to existing datasets may provide greater insights into the potential impacts and influences of sedentary time in children.
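For readers unfamiliar with the two headline metrics, the sketch below computes balanced accuracy (the mean of per-class recall, appropriate for imbalanced sitting vs. non-sitting labels) and MAPE on toy data. The arrays, agreement rate, and summary values are invented; this is not the study's code.

```python
# Hedged sketch: the evaluation's two headline metrics on synthetic data.
import numpy as np
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)
activpal = rng.integers(0, 2, 8640)      # ground-truth 10-s epoch labels (toy)
# Simulated predictions agreeing with ground truth ~90% of the time.
pred = np.where(rng.random(8640) < 0.9, activpal, 1 - activpal)
print("balanced accuracy:", balanced_accuracy_score(activpal, pred))

def mape(truth: np.ndarray, est: np.ndarray) -> float:
    """Mean absolute percent error for participant-season summary variables."""
    return float(np.mean(np.abs((est - truth) / truth)) * 100)

sed_truth = np.array([410.0, 385.0, 450.0])  # sedentary minutes/day (toy)
sed_est = np.array([402.0, 400.0, 445.0])
print("MAPE (%):", mape(sed_truth, sed_est))
```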
  7. Deep learning now offers state-of-the-art accuracy for many prediction tasks. Deep convolutional neural networks (CNNs), a form of deep learning, are especially popular on image, video, and time series data. Due to its high computational cost, CNN inference is often a bottleneck in analytics tasks on such data. Thus, much work in the computer architecture, systems, and compilers communities studies how to make CNN inference faster. In this work, we show that by elevating the abstraction level and re-imagining CNN inference as queries, we can bring to bear database-style query optimization techniques to improve CNN inference efficiency. We focus on tasks that perform CNN inference repeatedly on inputs that are only slightly different. We identify two popular CNN tasks with this behavior: occlusion-based explanations (OBE) and object recognition in videos (ORV). OBE is a popular method for “explaining” CNN predictions: it outputs a heatmap over the input to show which regions (e.g., image pixels) mattered most for a given prediction, and it leads to many re-inference requests on locally modified inputs. ORV uses CNNs to identify and track objects across video frames; it also leads to many re-inference requests. We cast such tasks in a unified manner as a novel instance of the incremental view maintenance problem and create a comprehensive algebraic framework for incremental CNN inference that reduces computational costs. We produce materialized views of features produced inside a CNN and connect them with a novel multi-query optimization scheme for CNN re-inference. Finally, we also devise novel OBE-specific and ORV-specific approximate inference optimizations exploiting their semantics. We prototype our ideas in Python to create a tool called Krypton that supports both CPUs and GPUs. Experiments with real data and CNNs show that Krypton reduces runtimes by up to 5× (respectively, 35×) to produce exact (respectively, high-quality approximate) results without raising resource requirements.
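The sketch below shows the naive OBE loop that Krypton targets: slide an occluding patch over the image, re-run inference, and record the drop in the target class score as a heatmap. Krypton's contribution is to avoid exactly this full re-inference by materializing intermediate CNN features and updating only the regions a patch can affect; the toy model, patch size, and image size here are illustrative assumptions, not the tool's code.

```python
# Hedged sketch of naive occlusion-based explanation (OBE). Every occluded
# input triggers a full forward pass; Krypton instead reuses materialized
# intermediate features. Toy model and sizes only.
import torch

model = torch.nn.Sequential(                      # toy CNN stand-in
    torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(8, 10),
).eval()

img = torch.randn(1, 3, 64, 64)
patch, stride = 16, 16
heat = torch.zeros(64 // stride, 64 // stride)

with torch.no_grad():
    target = model(img).argmax(dim=1).item()      # class to explain
    base = model(img)[0, target].item()           # unoccluded target score
    for i in range(0, 64, stride):                # slide the occluding patch
        for j in range(0, 64, stride):
            occluded = img.clone()
            occluded[:, :, i:i + patch, j:j + patch] = 0.0
            score = model(occluded)[0, target].item()
            heat[i // stride, j // stride] = base - score  # bigger drop => more important
print(heat)
```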